Programmatic video generation using React, Remotion, and Temporal Authority.
Track 03 · Media. A developer starter kit for rendering 9:16 vertical MP4s from text briefs. The core engine runs script agents to draft scenes, calls Edge TTS to generate speech audio, measures the resulting audio files with ffprobe, and injects exact frame durations into Remotion templates. Extracted from the production Agentic OS.
Building programmatic video by awaiting async work inside React components is a recipe for broken layouts. React is built for rendering UI state, not for coordinating multi-second media buffers. If you fetch speech tracks or calculate text animation timing on the fly inside Remotion compositions, scene durations are not known when rendering starts, which leads to timing errors: in the rendered video, overlays drift out of sync with the audio track.
Programmatic video requires a **Temporal Authority**. Programmatic timing decisions must be made before React starts compiling frames. The Node.js orchestrator must generate the assets, query their durations in milliseconds using CLI utilities, convert those durations into integers based on your target framerate, and feed those integers into React props. React acts as a flat renderer, reading numbers without doing timing math.
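The hand-off described above can be sketched as a small pure function that runs in the orchestrator, not in React. This is a minimal sketch, not the starter kit's actual code: the `buildTimeline` name, the scene object shape, and the sample frame counts are all assumptions made for illustration. It takes per-scene integer frame counts and produces the start offsets that a Remotion `<Sequence from={...} durationInFrames={...}>` would consume.

```javascript
// Hypothetical helper (not from the repo): turns measured per-scene frame
// counts into start offsets, so React only reads numbers and does no math.
function buildTimeline(scenes) {
  let cursor = 0;
  const timeline = scenes.map((scene) => {
    if (!Number.isInteger(scene.durationInFrames) || scene.durationInFrames <= 0) {
      throw new Error(`Scene "${scene.id}" needs a positive integer frame count`);
    }
    const entry = { ...scene, from: cursor }; // first frame of this scene
    cursor += scene.durationInFrames;         // next scene starts right after
    return entry;
  });
  return { scenes: timeline, totalFrames: cursor };
}

// Made-up sample data: two scenes whose audio was already measured.
const props = buildTimeline([
  { id: 'scene-1', text: 'Scene one', durationInFrames: 139 },
  { id: 'scene-2', text: 'Scene two', durationInFrames: 212 },
]);
console.log(JSON.stringify(props, null, 2));
```

With this sample input the second scene starts at frame 139 and the composition is 351 frames long. Because the function rejects non-integer durations, a rounding mistake surfaces in the orchestrator instead of as drift in the rendered video.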
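The Temporal Authority module (`pipeline/temporal-authority.mjs`) runs ffprobe to read the exact audio duration down to the millisecond. It multiplies the duration in seconds by the target framerate (e.g. 30 fps) and rounds up to get an integer frame duration, so a clip never ends before its audio does.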
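The measurement step above might look roughly like the following. This is a sketch under stated assumptions, not the contents of the repo's module: the function names are hypothetical, ffprobe is assumed to be on `PATH`, and snapping to whole milliseconds before rounding up is an added precaution rather than something the source describes.

```javascript
// Parse the plain-text duration ffprobe prints, for example "4.632000\n".
function parseDurationSeconds(stdout) {
  const seconds = Number.parseFloat(stdout.trim());
  if (!Number.isFinite(seconds) || seconds <= 0) {
    throw new Error(`Unexpected ffprobe output: ${JSON.stringify(stdout)}`);
  }
  return seconds;
}

// Round up so the visual clip is never shorter than its audio.
// Assumption: snap to integer milliseconds first, so floating-point noise
// in the seconds * fps product cannot add a spurious extra frame.
function secondsToFrames(seconds, fps = 30) {
  const ms = Math.round(seconds * 1000);
  return Math.ceil((ms * fps) / 1000);
}

// Hypothetical wrapper: shells out to ffprobe and returns an integer frame count.
async function probeFrames(audioPath, fps = 30) {
  const { execFile } = await import('node:child_process');
  const { promisify } = await import('node:util');
  const { stdout } = await promisify(execFile)('ffprobe', [
    '-v', 'error',                               // suppress banner and warnings
    '-show_entries', 'format=duration',          // only the container duration
    '-of', 'default=noprint_wrappers=1:nokey=1', // print the bare number
    audioPath,
  ]);
  return secondsToFrames(parseDurationSeconds(stdout), fps);
}
```

For example, an audio file of 4.632 s at 30 fps is 138.96 frames, which `secondsToFrames(4.632, 30)` rounds up to 139.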
The pipeline targets free developer keys and APIs to allow rendering without runtime costs:
| Step | API / Engine | Price |
|---|---|---|
| LLM Scripting | NVIDIA NIM Developer Console → Google Gemini Free API | $0.00 (Free tiers) |
| Text-to-Speech | Edge TTS (reverse-engineered Microsoft speech endpoint) | $0.00 (No key required) |
| Image Generation | Pollinations.ai (Flux & SDXL wrappers) | $0.00 (No key required) |
| Video Compilation | Remotion CLI + FFmpeg + ffprobe package | $0.00 (Local compiler) |
- `pipeline/run.mjs`: The coordinator. Calls the script, voice, and visual steps, writes output data files, and fires the Remotion compiler.
- `pipeline/llm-router.mjs`: Resiliency. Fallback router that skips down the line of API keys if endpoints fail or are rate-limited.
- `pipeline/temporal-authority.mjs`: Executes the ffprobe child process to read audio length and convert it to frame integers.
- `src/Root.tsx`: Declares composition structures and coordinates frame props inside Remotion.

```bash
git clone https://github.com/shubham0086/video-engine-starter
cd video-engine-starter
npm install
cp .env.example .env

# Compile a video on Deep Sleep
node pipeline/run.mjs "The science of deep sleep"

# Open Remotion Studio to preview
npx remotion studio

# Compile video file to MP4
npx remotion render BasicReel out/deep-sleep.mp4
```
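The fallback behavior attributed to `pipeline/llm-router.mjs` can be illustrated with a short sketch. Everything here is an assumption for illustration: the `routeWithFallback` name, the `{ name, call }` provider shape, and the stub providers are hypothetical, and the real router may handle errors differently. The idea is simply to try each provider in order and move on when one fails or is rate-limited.

```javascript
// Hypothetical provider shape: { name, call(prompt) -> Promise<string> }.
// Real providers would wrap the NVIDIA NIM and Gemini HTTP APIs.
async function routeWithFallback(providers, prompt) {
  const failures = [];
  for (const provider of providers) {
    try {
      const text = await provider.call(prompt);
      return { provider: provider.name, text };
    } catch (err) {
      // Outages and rate limits are treated alike in this sketch:
      // record the failure and try the next provider in line.
      failures.push(`${provider.name}: ${err.message}`);
    }
  }
  throw new Error(`All providers failed -> ${failures.join('; ')}`);
}

// Stub providers so the sketch runs without any API keys.
const demoProviders = [
  { name: 'primary', call: async () => { throw new Error('429 rate limited'); } },
  { name: 'secondary', call: async (prompt) => `script for: ${prompt}` },
];

routeWithFallback(demoProviders, 'The science of deep sleep')
  .then((result) => console.log(result.provider, '->', result.text));
```

Collecting the failure messages before throwing keeps the final error useful when every provider in the chain is down.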
Video-Engine-Starter is the **programmatic media output** layer. It acts as Stage 4 of the Agentic OS video automation pipeline:
Brief Intake → Research → Outlines → [Video Engine Starter] → Publishing QA → Distribution
This is a developer starter kit, not a plug-and-play SaaS system. It renders basic slides, text layouts, captions, and static images. If you require advanced features like keyframe animations, audio filters, custom transitions, or multi-track audio layering, you will need to write custom Remotion templates in React.

It is a foundation for building video programmatically.